
Conversation

@QiangCai (Contributor) commented Jan 3, 2016

I have closed pull request #10487 and created this pull request to resolve the problem.

Spark JIRA: https://issues.apache.org/jira/browse/SPARK-12340

@SparkQA commented Jan 3, 2016

Test build #2307 has finished for PR 10562 at commit 4974f05.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review comment (Member): I don't think a blank line is needed here.

@QiangCai (Contributor, Author) commented Jan 4, 2016

I have removed some blank lines.

@srowen (Member) commented Jan 4, 2016

@QiangCai the problem isn't blank lines but whitespace at the end of your lines.

@QiangCai (Contributor, Author) commented Jan 4, 2016

@srowen I have removed the trailing whitespace.

@QiangCai QiangCai closed this Jan 4, 2016
@QiangCai QiangCai reopened this Jan 4, 2016
@SparkQA commented Jan 4, 2016

Test build #2310 has finished for PR 10562 at commit 639cfb2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@QiangCai (Contributor, Author) commented Jan 5, 2016

@srowen I have no idea how to resolve this unit test failure. Could you help me?

@srowen (Member) commented Jan 5, 2016

@QiangCai I think the test failures are unrelated. However, before we can retest, you'll have to rebase, as there is a merge conflict now.

@QiangCai (Contributor, Author) commented Jan 5, 2016

@srowen I have rebased from master and resolved all conflicts.

Review comment (Member): This shouldn't be here.

Reply (Contributor, Author): I have removed it.

@SparkQA commented Jan 5, 2016

Test build #2325 has finished for PR 10562 at commit 3d340f7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@QiangCai (Contributor, Author) commented Jan 5, 2016

@srowen I found some error messages in the test build log: a java.lang.OutOfMemoryError occurred. The code at line 71 of AsyncRDDActions.scala is `val results = new ArrayBuffer[T](num)`; because the parameter num (2147483638) is so large, the JVM cannot allocate enough memory.

error messages:
[info] Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.SQLQuerySuite *** ABORTED *** (31 seconds, 447 milliseconds)
[info] java.lang.OutOfMemoryError: Java heap space
[info] at scala.collection.mutable.ResizableArray$class.$init$(ResizableArray.scala:32)
[info] at scala.collection.mutable.ArrayBuffer.<init>(ArrayBuffer.scala:47)
[info] at org.apache.spark.rdd.AsyncRDDActions$$anonfun$takeAsync$1.apply(AsyncRDDActions.scala:71)
[info] at org.apache.spark.rdd.AsyncRDDActions$$anonfun$takeAsync$1.apply(AsyncRDDActions.scala:66)
[info] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[info] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
[info] at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
[info] at org.apache.spark.rdd.AsyncRDDActions.takeAsync(AsyncRDDActions.scala:66)
[info] at org.apache.spark.sql.SQLQuerySuite$$anonfun$132.apply$mcV$sp(SQLQuerySuite.scala:2079)
[info] at org.apache.spark.sql.SQLQuerySuite$$anonfun$132.apply(SQLQuerySuite.scala:2071)
[info] at org.apache.spark.sql.SQLQuerySuite$$anonfun$132.apply(SQLQuerySuite.scala:2071)
......
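
To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Spark source), assuming Scala 2.11 collections, where `new ArrayBuffer[T](initialSize)` eagerly allocates its backing array:

```scala
import scala.collection.mutable.ArrayBuffer

object TakeAsyncOomSketch {
  def main(args: Array[String]): Unit = {
    // The failing test requested close to Int.MaxValue elements.
    val num = 2147483638
    // ArrayBuffer(initialSize) allocates a backing array of that size up front,
    // so this line alone throws java.lang.OutOfMemoryError: Java heap space
    // on a default-sized heap, before a single element is appended.
    val results = new ArrayBuffer[Int](num)
    results += 1
  }
}
```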

@sarutak (Member) commented Jan 5, 2016

Why is the instance of ArrayBuffer in AsyncRDDActions#takeAsync created with an initial size? By contrast, the instance of ArrayBuffer in RDD#take is created without one.

Review comment (Member): If you have a chance to modify this again, please insert a space between `)` and `{`.

Reply (Contributor, Author): I will do it.

@QiangCai (Contributor, Author) commented Jan 5, 2016

@sarutak Maybe we have found another bug. I will try to fix it.

@QiangCai (Contributor, Author) commented Jan 5, 2016

I have removed the initial size `num`. The initial size will now be the default value of 16, the same as in RDD#take, so it should be okay.
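
For illustration, a hedged sketch of the behavior after this change (not the patched Spark code itself): with no initial size, the buffer starts at the default capacity of 16 and its backing array doubles on demand, so memory tracks the elements actually collected rather than the requested count:

```scala
import scala.collection.mutable.ArrayBuffer

// Default capacity is 16; the backing array doubles as elements arrive,
// so appending n results costs O(n) amortized and never pre-allocates
// for a huge requested take() count.
val results = new ArrayBuffer[Int]()
(1 to 1000000).foreach(results += _)
println(results.length) // 1000000
```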

@SparkQA commented Jan 5, 2016

Test build #2326 has finished for PR 10562 at commit e7577ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@QiangCai (Contributor, Author) commented Jan 6, 2016

I think I have resolved this problem.

@srowen (Member) commented Jan 6, 2016

LGTM

@sarutak (Member) commented Jan 6, 2016

Merging this into master and branch-1.6. Thanks, @QiangCai!

@asfgit asfgit closed this in 5d871ea Jan 6, 2016
@sarutak (Member) commented Jan 6, 2016

@QiangCai We have many conflicts against branch-1.6, so I'll merge this into master only for now.
If you want to merge this into branch-1.6, please feel free to open another PR.

@QiangCai QiangCai deleted the bugfix branch January 6, 2016 14:09
@QiangCai (Contributor, Author) commented Jan 6, 2016

OK. I have created another PR, #10619, to merge this code into branch-1.6.

Review comment (Contributor): Why is this change necessary? When can partsScanned go above 2B?

Reply (Member): Ah, you're right. partsScanned cannot exceed the value of totalParts. I'll revert it to Int.

Reply (Member): I think there is a legit problem here. Imagine totalParts is close to Int.MaxValue, and imagine partsScanned is close to totalParts. Adding p.size to it below could cause it to roll over. I think this change is needed.

Reply (Contributor): That's never possible -- if we have anywhere near 2B partitions, the scheduler won't be fast enough to schedule them. In fact, with anything more than a few million partitions, the scheduler will likely crash.

Reply (Member): Fair point; in practice this all but certainly won't happen. Note that this patch was already committed to master, making this a Long. It doesn't hurt, and is, in a very theoretical sense, more correct locally. I suppose it isn't worth updating again, but I don't feel strongly about it.

Reply (Contributor): I'd prefer to change it back, since it is so little work, so this doesn't start a trend of changing all Ints to Longs for no reason. Note that this also raises the question of why this value could exceed Int.MaxValue when somebody reads this code in the future.

Also @srowen, even if totalParts is close to Int.MaxValue, I don't think partsScanned can be greater than Int.MaxValue, because we never scan more parts than the number of parts available.

Reply (Contributor): Ah, OK -- you were referring to partsScanned + numPartsToTry. We should just cast that to Long to minimize the impact.
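
A hedged sketch of the arithmetic being discussed, with hypothetical values rather than real Spark state: in Int arithmetic the sum wraps around near Int.MaxValue, while widening one operand to Long keeps it exact:

```scala
object PartsScannedOverflowSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical values chosen to sit at the edge of the Int range.
    val partsScanned: Int = Int.MaxValue - 10
    val numPartsToTry: Int = 100

    val wrapped: Int = partsScanned + numPartsToTry          // Int overflow: wraps negative
    val widened: Long = partsScanned.toLong + numPartsToTry  // promoted to Long: exact

    println(s"Int sum:  $wrapped")  // -2147483559
    println(s"Long sum: $widened")  // 2147483737
  }
}
```

Casting just this one expression to Long, as suggested above, rules out the wraparound while leaving the surrounding variables as Int.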

@rxin (Contributor) commented Jan 6, 2016

@QiangCai it would be great if you could submit a new pull request to address the comments. Thanks.

rxin added a commit to rxin/spark that referenced this pull request Jan 9, 2016

This is a follow-up for the original patch apache#10562.

asfgit pushed a commit that referenced this pull request Jan 9, 2016

This is a follow-up for the original patch #10562.

Author: Reynold Xin <rxin@databricks.com>

Closes #10670 from rxin/SPARK-12340.